
Notes on Web Scraping

#programming

Separate web crawling and web scraping.
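A minimal sketch of that separation in Python. It assumes a local directory stands in for object storage and that the fetch function is passed in; both are illustrative choices, not something the note prescribes. The crawl stage only fetches and stores raw HTML. The scrape stage only parses what was stored, so parsing logic can change and be re-run without hitting the site again. crawl, scrape and _TitleParser are hypothetical names.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable


def crawl(url: str, fetch: Callable[[str], str], store_dir: Path) -> Path:
    """Crawl stage: fetch raw HTML and store it untouched.

    The file name is the SHA-256 of the URL, so re-crawling a URL
    overwrites the same file. No parsing happens here.
    """
    store_dir.mkdir(parents=True, exist_ok=True)
    path = store_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    path.write_text(fetch(url), encoding="utf-8")
    return path


class _TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(path: Path) -> dict[str, str]:
    """Scrape stage: parse a previously stored page (never touches the network)."""
    parser = _TitleParser()
    parser.feed(path.read_text(encoding="utf-8"))
    return {"title": parser.title.strip()}
```

In practice fetch could wrap urllib.request or a headless browser; because scrape reads only from storage, it can be run as many times as needed against the same saved pages.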

Use a service like Urlbox to save the page to S3, get a webhook when it is ready to scrape, and then run whatever scraping tool fits over the stored copy.
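A sketch of the webhook side in Python, assuming the capture service POSTs a JSON body that names the stored S3 object. The payload shape ({"status": ..., "s3_uri": ...}) is hypothetical and is not Urlbox's documented format; check the service's docs for the real field names. handle_capture_webhook, parse_s3_uri and enqueue_scrape are likewise illustrative names. The handler deliberately does no parsing; it only hands the bucket and key to whatever runs the scraper.

```python
import json
from typing import Callable
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"expected s3://bucket/key, got {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def handle_capture_webhook(
    body: bytes, enqueue_scrape: Callable[[str, str], None]
) -> bool:
    """Webhook receiver: queue a scrape job for the stored page.

    HYPOTHETICAL payload: {"status": "succeeded", "s3_uri": "s3://bucket/key"}.
    Returns True if a job was queued, False if there is nothing to scrape yet.
    """
    payload = json.loads(body)  # json.loads accepts bytes directly
    if payload.get("status") != "succeeded":
        return False  # capture failed or is still pending
    bucket, key = parse_s3_uri(payload["s3_uri"])
    enqueue_scrape(bucket, key)
    return True
```

In a real deployment enqueue_scrape might push onto a job queue, and the scraper would then read the object itself, for example with boto3's get_object(Bucket=..., Key=...).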

Playwright seems to be a useful tool in the web scraping stack, especially for pages that need a real browser to render their content.

Pandas read_html() seems useful for pulling HTML tables straight into DataFrames.
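A small example, assuming pandas plus one of its HTML parser backends (lxml, or BeautifulSoup with html5lib) is installed. read_html returns a list with one DataFrame per table it finds, and the match argument keeps only tables whose text matches the given string or regex. The HTML here is made up for illustration.

```python
from io import StringIO

import pandas as pd

# Made-up page with two tables: a navigation table and a price table.
html = """
<table><tr><td>Home</td><td>About</td></tr></table>
<table>
  <thead><tr><th>Item</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>Widget</td><td>9.99</td></tr>
    <tr><td>Gadget</td><td>24.50</td></tr>
  </tbody>
</table>
"""

# Wrapping literal HTML in StringIO avoids the FutureWarning that
# pandas 2.1+ emits when a raw HTML string is passed.
tables = pd.read_html(StringIO(html), match="Price")
prices = tables[0]  # <th> cells become the column names
print(prices)
```

read_html also accepts a URL or file path directly, but it only sees the HTML as served, so tables built by JavaScript need a rendered copy first.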